Bulldozer (microarchitecture)
Bulldozer is the codename Advanced Micro Devices (AMD) has given to one of the CPU cores based on the AMD family 15h microarchitecture, the successor to the family 10h (K10) microarchitecture. It was designed under the company's M-SPACE design methodology, with the core specifically aimed at computing products with a TDP of 10 to 125 watts. Bulldozer was designed from scratch rather than developed from earlier processors.[1] AMD claims dramatic performance-per-watt improvements in high-performance computing (HPC) applications with Bulldozer cores. Desktop processors with the Bulldozer core were released on October 12, 2011.
The Bulldozer cores support most of the instruction sets implemented by Intel processors available at its introduction (including SSE4.1, SSE4.2, AES, CLMUL, and AVX) as well as future instruction sets proposed by AMD (XOP and FMA4).[2][3]
Basic description
According to AMD, Bulldozer-based CPUs are based on GlobalFoundries' 32 nm Silicon on insulator (SOI) process technology and utilize a new approach to multithreaded computer performance that, according to press notes, "balances dedicated and shared computer resources to provide a highly compact, high core count design that is easily replicated on a chip for performance scaling."[4] In other words, by eliminating some of the redundancies that naturally creep into multicore designs, AMD hoped to take better advantage of its hardware capabilities, while using less power.
Bulldozer-based implementations built on 32 nm SOI with HKMG arrived in October 2011 for both servers and desktops. The server segment included the dual-chip 16-core Opteron processor codenamed Interlagos (for Socket G34) and the single-chip 4–8 core Valencia (for Socket C32), while the 4–8 core Zambezi targeted desktops on Socket AM3+.[5][6]
Bulldozer is the first major redesign of AMD's processor architecture since 2003, when the firm launched its Athlon 64/Opteron (K8) processors. The design features two 128-bit FMA-capable FPUs, which can be combined into one 256-bit FPU, accompanied by two integer cores, each with four pipelines (the fetch/decode stage is shared). Bulldozer also introduces a shared L2 cache. AMD calls this design a "Bulldozer module". A 16-core processor design would feature eight of these modules,[7] but the operating system recognizes each module as two physical cores.
The module, described as two cores, can be contrasted with a single Intel core with HyperThreading. The difference between the two approaches is that Bulldozer provides dedicated schedulers and integer units for each thread, whereas in Intel's core all threads must compete for available execution resources.
Architecture
Bulldozer Module
- AMD has introduced a new microarchitecture building block called the module. In terms of hardware complexity and functionality, a module is midway between a dual-core processor (in which each core is fully independent) and a single processor core that has two SMT threads (in which each thread shares most of the hardware resources with the other thread).
- A module consists of two tightly coupled, "conventional" x86 out-of-order processing engines. Each processing engine shares the early pipeline stages (e.g. instruction fetch and decode), the FPUs, and the L2 cache with its sibling in the module.
- Each module has the following independent hardware resources:[8][9]
  - up to 2048 kB L2 cache per module (shared between the cores in a module)
  - 16 kB four-way L1 data cache (way-predicted) per core and two-way 64 kB L1 instruction cache per module, one way for each of the two cores[10][11][12]
  - Two dedicated integer cores
    - each consists of two ALUs and two AGUs, which are capable of a total of four independent arithmetic and memory operations per clock per core
    - duplicating the integer schedulers and execution pipelines gives each of the two threads dedicated hardware, which significantly increases performance in multithreaded integer applications
    - the second integer core increases the Bulldozer module die area by around 12%, which at chip level adds about 5% of total die space[13]
  - Two symmetrical 128-bit FMAC (fused multiply–add capability) floating-point pipelines per module, which can be unified into one large 256-bit-wide unit if one of the integer cores dispatches an AVX instruction, and two symmetrical x87/MMX/SSE-capable FPPs for backward compatibility with software not optimized for SSE2
- Multiple modules share an L3 cache as well as an Advanced Dual-Channel Memory Sub-System (IMC - Integrated Memory Controller).
- A module has 213 million transistors in an area of 30.9 mm² (including 2 MB L2 cache) on an Orochi die[14]
- A dual-core Bulldozer processor has a single module, a quad-core processor has two modules and an octo-core processor has four modules.
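The module arithmetic described in this list can be sketched in a few lines of Python. This is only an illustration built from the figures quoted above (two integer cores and up to 2 MB of L2 per module, 213 million transistors in 30.9 mm²); the function and constant names are hypothetical and do not come from AMD.

```python
# Illustrative sketch only; names are hypothetical, figures are those quoted in the list above.
CORES_PER_MODULE = 2        # two dedicated integer cores per module
L2_PER_MODULE_MB = 2        # up to 2048 kB of shared L2 per module
MODULE_TRANSISTORS = 213e6  # transistors per module on the Orochi die
MODULE_AREA_MM2 = 30.9      # module area, including the 2 MB L2 cache


def module_layout(integer_cores: int) -> dict:
    """Derive the module count and maximum L2 size from the advertised core count."""
    if integer_cores <= 0 or integer_cores % CORES_PER_MODULE != 0:
        raise ValueError("expected a positive, even number of integer cores")
    modules = integer_cores // CORES_PER_MODULE
    return {"modules": modules, "max_l2_mb": modules * L2_PER_MODULE_MB}


def module_density_millions_per_mm2() -> float:
    """Transistor density of one module, in millions of transistors per mm^2."""
    return MODULE_TRANSISTORS / MODULE_AREA_MM2 / 1e6


if __name__ == "__main__":
    for cores in (2, 4, 8):
        print(cores, module_layout(cores))
    print(round(module_density_millions_per_mm2(), 2), "M transistors/mm^2")
```

An eight-core part therefore maps to four modules and up to 8 MB of L2 in total, and the quoted module figures work out to roughly 6.9 million transistors per mm².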
Instruction set extensions
- Support for Intel's Advanced Vector Extensions (AVX) instruction set, which supports 256-bit floating-point operations, and for SSE4.1, SSE4.2, AES and CLMUL, as well as future 128-bit instruction sets proposed by AMD (XOP, FMA4 and CVT16),[15] which have the same functionality as the SSE5 instruction set formerly proposed by AMD, but are compatible with the AVX coding scheme.
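On Linux, software can check for these extensions by looking at the flags line of /proc/cpuinfo. The sketch below assumes the flag spellings used by the Linux kernel (sse4_1, sse4_2, aes, pclmulqdq, avx, xop, fma4); the helper names are hypothetical, and the sample flags string is made up for illustration rather than captured from a real Bulldozer system.

```python
# Illustrative sketch; helper names are hypothetical and the sample flags line is invented.
FLAG_NAMES = {
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "AES": "aes",
    "CLMUL": "pclmulqdq",
    "AVX": "avx",
    "XOP": "xop",
    "FMA4": "fma4",
}


def supported_extensions(flags_line: str) -> dict:
    """Map each extension name to True/False given a space-separated flags string."""
    flags = set(flags_line.split())
    return {name: flag in flags for name, flag in FLAG_NAMES.items()}


def read_cpu_flags(path: str = "/proc/cpuinfo") -> str:
    """Return the first 'flags' line of /proc/cpuinfo (Linux only), or '' if absent."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("flags"):
                return line.split(":", 1)[1]
    return ""


if __name__ == "__main__":
    sample = "fpu sse sse2 sse4_1 sse4_2 aes pclmulqdq avx xop fma4"
    print(supported_extensions(sample))
```

Parsing is kept separate from file access so that the check can be exercised on any platform with a canned flags string.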
Process technology and clock frequency
- 11-metal-layer 32 nm SOI process using GlobalFoundries' first-generation High-K Metal Gate (HKMG)
- Turbo Core performance boost to increase clock frequency by 500 MHz with all cores active (for most workloads) and further, as TDP headroom permits[16]
- The chip operates at 0.8 to 1.3 V, achieving clock frequencies of 3.5 GHz or more[14]
- Power usage ranging from 10 to 125 watts
Cache and memory interface
- Up to 8 MB of L3 cache shared among all modules on the same silicon die (8 MB per four modules, giving 16 MB for the eight-module dual-die MCM), divided into four subcaches of 2 MB each, capable of operating at 2.4 GHz or more at 1.1 V[14]
- Native DDR3-1866 memory support[17]
- Dual-channel DDR3 integrated memory controller (support for PC3-15000 (DDR3-1866)) for desktop; quad-channel DDR3 integrated memory controller (support for PC3-12800 (DDR3-1600) and registered DDR3)[18] for server/workstation (the new Opteron Valencia and Interlagos)
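The theoretical peak memory bandwidth implied by these configurations follows from the transfer rate, the channel width and the channel count. The sketch below assumes the standard 64-bit (8-byte) DDR3 channel and uses the nominal rates named above; the function name is hypothetical, and real-world throughput is lower than this peak.

```python
# Illustrative sketch; function name is hypothetical, the 64-bit channel width is the DDR3 standard.
def peak_bandwidth_gb_s(mega_transfers_per_s: int, channels: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000


if __name__ == "__main__":
    # Desktop: dual-channel DDR3-1866
    print("desktop:", peak_bandwidth_gb_s(1866, channels=2), "GB/s")
    # Server/workstation: quad-channel DDR3-1600
    print("server:", peak_bandwidth_gb_s(1600, channels=4), "GB/s")
```

Under these assumptions the dual-channel desktop configuration peaks at about 29.9 GB/s and the quad-channel server configuration at 51.2 GB/s.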
I/O and socket interface
- HyperTransport Technology rev. 3.1 (3.20 GHz, 6.4 GT/s, 25.6 GB/s, 16-bit uplink/16-bit downlink), first implemented in the HY-D1 revision "Magny-Cours" on the Socket G34 Opteron platform in March 2010 and "Lisbon" on the Socket C32 Opteron platform in June 2010
- Socket AM3+ (AM3b)
  - 942-pin, DDR3 support
  - will retain backward compatibility with Socket AM3 motherboards (at the motherboard manufacturer's discretion and if BIOS updates are provided[19][20]), although this is not officially supported by AMD; AM3+ motherboards will be backward-compatible with AM3 processors.[21]
- For the server segment, the existing socket G34 (LGA1974) and socket C32 (LGA1207) will be used.
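The HyperTransport figures quoted in this list (3.20 GHz, 6.4 GT/s, 25.6 GB/s over 16-bit links) are related by simple arithmetic: the link transfers data on both clock edges, and the aggregate figure counts both directions. A minimal sketch, with a hypothetical function name:

```python
# Illustrative sketch; function name is hypothetical, inputs are the figures quoted in the list above.
def ht_link_bandwidth(clock_ghz: float, width_bits: int = 16) -> dict:
    """Return transfer rate and bandwidth for one HyperTransport link."""
    transfers_gt_s = clock_ghz * 2                      # double data rate: two transfers per clock
    per_direction_gb_s = transfers_gt_s * width_bits / 8  # bits -> bytes
    return {
        "gt_per_s": transfers_gt_s,
        "per_direction_gb_s": per_direction_gb_s,
        "aggregate_gb_s": per_direction_gb_s * 2,       # uplink + downlink
    }


if __name__ == "__main__":
    print(ht_link_bandwidth(3.2))
```

For a 3.2 GHz clock and 16-bit links this gives 6.4 GT/s, 12.8 GB/s in each direction and 25.6 GB/s in aggregate, matching the quoted values.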
Processors
The first revenue shipments of Bulldozer-based Opteron processors were announced on September 7, 2011.[22] The FX-4100, FX-6100, FX-8120 and FX-8150 were released towards the end of 2011; AMD said that the remaining FX-series processors would be released at the end of the first quarter of 2012.
The expected Zambezi parts are summarized in the table below:
Model | Integer Cores / Modules | Normal Freq. | Full-Load Freq. (Turbo) | Half-Load Freq. (Turbo) | L2 Cache |
FX-8170 | 8/4 | 3.9 GHz | 4.2 GHz | 4.5 GHz | 8 MB |
FX-8150 | 8/4 | 3.6 GHz | 3.9 GHz | 4.2 GHz | 8 MB |
FX-8120 | 8/4 | 3.1 GHz | 3.4 GHz | 4.0 GHz | 8 MB |
FX-8100 | 8/4 | 2.8 GHz | 3.1 GHz | 3.7 GHz | 8 MB |
FX-6200 | 6/3 | 3.8 GHz | 4.0 GHz | 4.1 GHz | 6 MB |
FX-6120 | 6/3 | 3.6 GHz | 3.9 GHz | 4.2 GHz | 6 MB |
FX-6100 | 6/3 | 3.3 GHz | 3.6 GHz | 3.9 GHz | 6 MB |
FX-4170 | 4/2 | 4.2 GHz | 4.2 GHz | 4.3 GHz | 4 MB |
FX-4150 | 4/2 | 3.8 GHz | 3.9 GHz | 4.0 GHz | 4 MB |
FX-4120 | 4/2 | 3.9 GHz | 4.0 GHz | 4.1 GHz | 4 MB |
FX-4100 | 4/2 | 3.6 GHz | 3.7 GHz | 3.8 GHz | 4 MB |
Code Name | Zambezi (all models) |
TDP | 125 W; 125 W/95 W; 125 W; 95 W; 125 W; 95 W (merged cells, in model order) |
L3 Cache | 8 MB (all models) |
Memory | DDR3 >1866 MHz (all models) |
Unlocked | Yes; No; Yes (merged cells, in model order) |
Turbo Core 2.0 | Yes (all models) |
Socket | AM3+ (all models) |
Process Technology | 32 nm HKMG SOI (all models) |
Major source: CPU-World[23]
AMD plans two series of Bulldozer-based processors for servers: the Opteron 4200 series (codenamed Valencia, with up to 8 cores) and the Opteron 6200 series (codenamed Interlagos, with up to 16 cores).[24]
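The Turbo Core headroom of each part can be read off the table by subtracting the normal frequency from the two turbo frequencies. The sketch below copies the figures for four of the listed models into a dictionary (in MHz, to avoid floating-point rounding); the names used are hypothetical.

```python
# Illustrative sketch; names are hypothetical, clock figures are copied from the table above.
# Each entry is (normal, full-load turbo, half-load turbo) in MHz.
ZAMBEZI_CLOCKS_MHZ = {
    "FX-8150": (3600, 3900, 4200),
    "FX-8120": (3100, 3400, 4000),
    "FX-6100": (3300, 3600, 3900),
    "FX-4100": (3600, 3700, 3800),
}


def turbo_uplift_mhz(model: str) -> tuple:
    """Return (full-load uplift, half-load uplift) over the normal frequency, in MHz."""
    normal, full_load, half_load = ZAMBEZI_CLOCKS_MHZ[model]
    return full_load - normal, half_load - normal


if __name__ == "__main__":
    for model in ZAMBEZI_CLOCKS_MHZ:
        print(model, turbo_uplift_mhz(model))
```

For the FX-8150, for instance, this gives 300 MHz of uplift with all cores loaded and 600 MHz with half of them loaded.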
"FX" Release
On 12 October 2011, AMD released the first four FX-series processors of the Bulldozer line (FX-8150, FX-8120, FX-6100, FX-4100) and lifted their NDA on official reviews.[25]
The first Bulldozer CPUs were met with a mixed response. It was discovered that the FX-8150 performed poorly in benchmarks that were not highly threaded, falling behind the second-generation Intel Core i* series processors and being matched or even outperformed by AMD's own Phenom II X6 at lower clock speeds. In highly threaded benchmarks performance varied: the FX-8150 performed anywhere from on par with the Phenom II X6 to slightly better than the Intel Core i7 2600K, depending on the benchmark. Given the overall more consistent performance of the Intel Core i5 2500K at a lower price, these results left many reviewers underwhelmed. The i5 2500K also outperformed most of the previous generation of i7 CPUs, to which the high-end Bulldozer CPUs were more comparable. The processor was found to be extremely power-hungry under load, especially when overclocked, compared to Intel's Sandy Bridge.[26]
The Tom's Hardware website commented that the lower-than-expected performance in multi-threaded workloads may be because of the way Windows 7 currently schedules threads to the cores. They point out that "if Windows were able to utilize an FX-8150's four modules first, and then backfill each module's second core, it'd maximize performance with up to four threads running concurrently." This is similar to what happens on Intel CPUs with HyperThreading – Windows 7 "schedules to physical cores before utilizing logical (HyperThreaded) cores."[27]
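The placement policy Tom's Hardware describes, one thread per module first and then backfilling each module's second core, can be sketched as follows. This is an illustration of the idea only, not the Windows scheduler's actual algorithm; the function names and the (module, core) numbering are hypothetical.

```python
# Illustrative sketch of a module-first placement order; not the actual Windows scheduler.
from typing import List, Tuple


def module_first_order(modules: int, cores_per_module: int = 2) -> List[Tuple[int, int]]:
    """List (module, core) slots so every module gets one thread before any gets a second."""
    return [(m, c) for c in range(cores_per_module) for m in range(modules)]


def place_threads(n_threads: int, modules: int = 4) -> List[Tuple[int, int]]:
    """Assign n_threads to slots on a part with the given number of modules."""
    order = module_first_order(modules)
    if not 0 <= n_threads <= len(order):
        raise ValueError("thread count exceeds the number of integer cores")
    return order[:n_threads]


if __name__ == "__main__":
    print(place_threads(4))  # one thread on each of the four modules
    print(place_threads(6))  # two modules now run a second thread
```

With four threads on a four-module FX-8150, each thread gets a module's shared front end, FPU and L2 cache to itself; sharing only begins with the fifth thread.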
Overclocking was found to improve performance, but to increase power draw significantly.[28]
On 13 October, AMD stated on its blog that "there are some in our community who feel the product performance did not meet their expectations", but showed benchmarks on actual applications where it outperformed "Sandy Bridge i7 2600k" and "AMD X6 1100T".[29]
Post-2011
2nd Generation
AMD Financial Analyst Day 2010[30] revealed that the 2nd generation was scheduled for 2012; AMD referred to this generation as Enhanced Bulldozer. This later generation of the Bulldozer core is codenamed Piledriver, and is intended for specific desktop and notebook markets:
- Desktop Performance market (Volan platform[31]): Zambezi's replacement is Vishera, with up to 8 cores and Turbo Core 3.0, using the existing Socket AM3+ format and 9xx-series chipset of the 1st-generation FX-series Zambezi processor. AMD says that this 2nd-generation FX-series processor, codenamed Piledriver, would offer a performance increase of up to 20% to 30% under digital media workloads.
- Desktop Budget and Mainstream market (Virgo platform[32]): The replacement for the Stars-based Llano Fusion APU line is the 2- to 4-core Socket FM2 Trinity, Weatherford, and Richland Fusion APUs, selling at various price points in the desktop market.[33]
- Notebook Mainstream and Performance market (Comal platform[34]): the same as mentioned in Desktop Budget/Mainstream market.
At the AMD Fusion Developer Summit (AFDS) 2011, AMD said that the computational capacity of the notebook variant of Trinity would be 50% higher than that of Llano.[35][36][37]
For the server market, two versions were known to be under development as of November 2011:[38][39]
- Cost-effective, energy-efficient server (1 to 2 CPUs) market: Opteron 4200-series (Valencia; 6 or 8 cores) will be replaced by Sepang (up to 10 cores). Sepang will use a socket format called C2012. The memory controller will support a triple-channel DDR3 memory configuration, and the processor will have PCI Express 3.0 controller support.
- Enterprise and Mainstream server (2 to 4 CPUs) market: Opteron 6200-series (Interlagos; 8, 12, and 16 cores) will be replaced by Terramar (up to 20 cores). Terramar will use a socket format called G2012. Like Sepang, it will have a PCI Express 3.0 controller, but it differs by supporting a quad-channel DDR3 memory configuration.
3rd Generation
As of 2011, AMD had mentioned (by name) a 3rd-generation Bulldozer-based line for 2013,[38] with the working title Next Generation Bulldozer, on the 22 nm FD-SOI manufacturing process.[40]
On 21 September 2011, leaked AMD slides indicated this 3rd generation of Bulldozer core was codenamed Steamroller[41][42] and would be incorporated into specific desktop and notebook markets:
- Desktop Budget and Mainstream market (??? platform): The Trinity Fusion APU line will be replaced by Kaveri Fusion APU line as the 3rd generation A8-, A6-, and A4-series for the desktop market.
- Notebook Mainstream and Performance market (Indus platform): Will be the same as mentioned in Desktop Budget/Mainstream market. The FCH chipset will be codenamed Bolton.
For the server market, two versions were planned[43]:
- Cost-effective, energy efficient server (1 to 2 CPUs) market: Opteron 4200-series Sepang (up to 10 cores) to be replaced by Macau (up to 10 cores), re-using the C2012 socket format.
- Enterprise and Mainstream server (2 to 4 CPUs) market: Opteron 6200-series Terramar (up to 20 cores) to be replaced by Dublin (up to 20 cores), re-using the G2012 socket format.
4th Generation
On 12 October 2011, AMD revealed Excavator to be the codename for the 4th generation Bulldozer core, scheduled for 2014 release.[44]
Reported problems
Some websites have reported that there may be a BIOS issue with several motherboards based on the Socket AM3+ platform, causing the new AMD FX-series CPUs to downclock when the CPU exceeds its 125 W thermal limit. AMD has been working closely with ASUS to rectify this thermal down-clocking issue. Other motherboard manufacturers, including but not limited to Gigabyte and MSI, are also working to correct this issue by way of a modified BIOS.[45]
References
- ^ http://www.techpowerup.com/138328/Bulldozer-50-Faster-than-Core-i7-and-Phenom-II.html
- ^ "AMD64 Architecture Programmer’s Manual Volume 6: 128-Bit and 256-Bit XOP, and FMA4 Instructions". AMD. May 1, 2009. http://support.amd.com/us/Processor_TechDocs/43479.pdf. Retrieved 2009-05-08.
- ^ "Striking a balance". Dave Christie, AMD Developer blogs. May 7, 2009. http://forums.amd.com/devblog/blogpost.cfm?threadid=112934&catid=208. Retrieved 2009-05-08.
- ^ "AMD Sets New Mark in x86 Innovation with First Detailed Disclosures of Two New Core Designs". AMD. August 24, 2010. p. 1. http://www.amd.com/us/press-releases/pages/amd-x86-innovation-new-core-designs-2010aug24.aspx. Retrieved September 18, 2011.
- ^ "Analyst Day 2009 Summary". AMD. November 11, 2009. http://www.amd.com/us/press-releases/Pages/amd-analyst-day-2009nov11.aspx. Retrieved 2009-11-14.
- ^ Planet 3DNow! - Das Online-Magazin für den AMD-User
- ^ "Analyst Day 2009 Presentations". AMD. November 11, 2009. http://phx.corporate-ir.net/phoenix.zhtml?c=74093&p=irol-analystday. Retrieved 2009-11-14.
- ^ "Bulldozer microarchitecture block". AnandTech. August 24, 2010. http://images.anandtech.com/reviews/cpu/amd/hotchips2010/bulldozeruarch.jpg.
- ^ "Bulldozer module functional schematic". AMD. August 24, 2010. http://www.xbitlabs.com/images/news/2010-08/bulldozer_3_aug2010.png.
- ^ More On Bulldozer
- ^ AMD Reveals Details About Bulldozer Microprocessors.
- ^ AMD's Bulldozer Microarchitecture
- ^ "Bulldozer design power efficiency". AMD. August 24, 2010. http://images.anandtech.com/reviews/cpu/amd/hotchips2010/bulldozerefficient.jpg.
- ^ a b c Paper abstracts of the ISSCC 2011 conference
- ^ XOP and FMA4 Instruction set in SSE5
- ^ AMD Financial Analyst Day 2010, Server Platforms Presentation
- ^ AMD Roadmap
- ^ http://www.theregister.co.uk/2010/11/15/amd_bulldozer_opteron_rollout/page2.html
- ^ ASUS confirms AM3+ compatibility on AM3 boards
- ^ MSI confirms AM3+ compatibility on AM3 boards
- ^ AM3 processors will work in the AM3+ socket, but Bulldozer chips will not work in non-AM3+ motherboards
- ^ "AMD Ships First "Bulldozer" Processors". http://finance.yahoo.com/news/AMD-Ships-First-Bulldozer-iw-1483835751.html?x=0.
- ^ http://www.cpu-world.com/CPUs/Bulldozer/index.html
- ^ "What Is Bulldozer?". http://blogs.amd.com/work/2010/08/02/what-is-bulldozer/.
- ^ Unlock Your Record Setting AMD FX Series Processor Today
- ^ http://www.xbitlabs.com/articles/cpu/display/amd-fx-8150_13.html#sect0
- ^ Tom's Hardware review. http://www.tomshardware.com/reviews/fx-8150-zambezi-bulldozer-990fx,3043-3.html
- ^ http://www.xbitlabs.com/articles/cpu/display/amd-fx-8150_14.html#sect0
- ^ Our Take on AMD FX
- ^ AMD financial analyst day 2010 press kit
- ^ http://www.xbitlabs.com/news/cpu/display/20110906193303_AMD_Cancels_Next_Gen_Komodo_Processor_Corona_Platform_in_Favour_of_New_Chips.html
- ^ http://www.bit-tech.net/hardware/cpus/2011/06/14/amd-reveals-2012-roadmap/1
- ^ http://mb.zol.com.cn/240/2405453.html
- ^ http://www.donanimhaber.com/islemci/haberleri/AMDnin-2012-icin-planladigi-yeni-nesil-Fusion-platformlari-detaylandi.htm
- ^ http://www.xbitlabs.com/news/cpu/display/20110614211754_AMD_s_Trinity_to_Be_at_Least_50_Faster_than_Llano_Company.html
- ^ http://www.hardware.fr/news/11647/afds-50-trinity-10-tflops-2020.html
- ^ http://www.brightsideofnews.com/news/2011/6/15/amd-demonstrates-trinity2c-promises-10tflops-apu-by-2020.aspx
- ^ a b http://www.xbitlabs.com/news/cpu/display/20101109113213_AMD_Plans_to_Release_Twenty_Core_Microprocessor_in_2012.html
- ^ http://blogs.amd.com/work/fadcodenames/
- ^ http://www.eetimes.com/design/eda-design/4217997/The-next-transistor--planar--fins--and-SoI-at-22nm
- ^ http://prohardver.hu/hir/amd_hosszutavu_mobil_utiterv.html
- ^ http://www.xtremehardware.it/news/hardware/nuove-roadmap-amd-sulle-future-apu-in-programma-nel-2012-e-nel-2013-per-il-mercato-mobile-201109215761/
- ^ http://www.inpai.com.cn/doc/hard/154678.htm
- ^ http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested
- ^ BiosBug1